Indexing Real-World Data using Semi-Structured Documents
Abstract
We address the problem of deriving meaningful semantic index information for a multi-media database using a semi-structured document model. We show how our framework, called feature grammars, can be used to (1) exploit third-party interpretation modules for real-world unstructured components, and (2) use context-free grammars to convert such poorly structured or unstructured input to semi-structured output. The basic idea is to enrich context-free grammars with special symbols called detectors, which provide the necessary structure just in time to satisfy a parser look-ahead. A prototype implementation has been constructed in the Acoi project to demonstrate the feasibility of this approach for indexing both images and audio documents.
1991 Computing Reviews Classification System: [H.2.4, H.3.1] Multimedia indexing
Similar resources
Indexation des documents XML : Un DataGuide annoté avec un index de contenu
Indexing in classical information retrieval offers few tools for handling semi-structured documents: document representations in information retrieval were conceived for flat and homogeneous documents. They are not suited to the simultaneous treatment of structure and content. Several approaches to indexing semi-structured data have been proposed to resolve this new chal...
Searching web data: An entity retrieval and high-performance indexing model
More and more (semi) structured information is becoming available on the Web in the form of documents embedding metadata (e.g., RDF, RDFa, Microformats and others). There are already hundreds of millions of such documents accessible and their number is growing rapidly. This calls for large scale systems providing effective means of searching and retrieving this semi-structured information with ...
Discovering Frequent Substructures from Hierarchical Semi-structured Data
Frequent substructure discovery from a collection of semi-structured objects can serve for storage, browsing, querying, indexing and classification of semi-structured documents. This paper examines the problem of discovering frequent substructures from a collection of hierarchical semi-structured objects of the same type. The use of wildcards is an important aspect of substructure discovery from...
Acoi: A System for Indexing Multimedia Objects
In this paper, we present a system that combines independent feature detector programs with multimedia database technology to provide a semantically rich index to multimedia data items on the World Wide Web. First, we introduce a grammatical framework, called feature grammars, which forms the indexing schema. Feature grammars are an extension of context-free grammars with active symbols (e.g. multi...
Publication date: 1999